40 research outputs found

    Linking the Resource Description Framework to cheminformatics and proteochemometrics

    Background: Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have been used in the molecular sciences for more than a decade, but have not yet found widespread use. The semantic web technology Resource Description Framework (RDF) and related methods appear to be sufficiently versatile to change that situation. Results: The work presented here focuses on linking RDF approaches to existing molecular chemometrics fields, including cheminformatics, QSAR modeling and proteochemometrics. Applications are presented that link RDF technologies to methods from statistics and cheminformatics, including data aggregation, visualization, chemical identification, and property prediction. They demonstrate how this can be done using various existing RDF standards and cheminformatics libraries. For example, we show how IC50 and Ki values are modeled for a number of biological targets using data from the ChEMBL database. Conclusions: We have shown that existing RDF standards can suitably be integrated into existing molecular chemometrics methods. Platforms that unite these technologies, like Bioclipse, make this even simpler and more transparent. Being able to create and share workflows that integrate data aggregation and analysis (visual and statistical) is beneficial to interoperability and reproducibility. The current work shows that RDF approaches are sufficiently powerful to support molecular chemometrics workflows.

    The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

    Open access article. Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer reviewed publishing platform for scientific computing software.

    Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on

    Rights: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are. Background: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards. Results: This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry. Conclusions: We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community. Peer Reviewed.

    Ligand-based Methods for Data Management and Modelling

    Drug discovery is a complicated and expensive process, with costs in the billion dollar range. One way of making the drug development process more efficient is better information handling, modelling and visualisation. The majority of today's drugs are small molecules, which interact with drug targets to cause an effect. Since the 1980s, large numbers of compounds have been systematically tested by robots in so-called high-throughput screening. Ligand-based drug discovery is based on modelling drug molecules. In the field known as Quantitative Structure–Activity Relationship (QSAR), molecules are described by molecular descriptors, which are used for building mathematical models. Based on these models, molecular properties can be predicted, and using the molecular descriptors, molecules can be compared for, e.g., similarity. Bioclipse is a workbench for the life sciences which provides ligand-based tools through a point-and-click interface. The aims of this thesis were to research and develop new or improved ligand-based methods and open source software, and to work towards making these tools available to users through the Bioclipse workbench. To this end, a series of molecular signature studies was done and various Bioclipse plugins were developed. An introduction to the field is provided in the thesis summary, which is followed by five research papers. Paper I describes the Bioclipse 2 software and the Bioclipse scripting language. Paper II describes Brunn, a laboratory information system for supporting work with dose-response studies on microtiter plates. Paper III presents the creation of a molecular fingerprint based on the molecular signature descriptor; the new fingerprints are evaluated for target prediction and found to perform on par with industry-standard commercial molecular fingerprints. Paper IV explores the effect of different parameter choices when using the signature fingerprint together with support vector machines (SVM) with the radial basis function (RBF) kernel, and finds reasonable default values. Paper V studies the performance of SVM-based QSAR on large datasets with the molecular signature descriptor; a QSAR model based on 1.2 million substances is created and made available from the Bioclipse workbench.
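As a rough illustration of the RBF kernel mentioned for Paper IV, the sketch below computes k(x, y) = exp(-gamma * ||x - y||^2) between two bit-vector fingerprints. It is not taken from the thesis: the fingerprints, the `gamma` value, and the function name `rbf_kernel` are assumptions made up for this example, and a real study would rely on an SVM library instead of hand-written kernel code.

```python
import math


def rbf_kernel(x, y, gamma):
    """Return exp(-gamma * ||x - y||^2) for two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Squared Euclidean distance; for 0/1 vectors this equals the number of differing bits.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical 8-bit fingerprints (illustrative only, not real signature fingerprints).
fp_a = [1, 0, 1, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 0, 1, 1, 0]

# gamma is one of the parameters that has to be chosen when using this kernel.
similarity = rbf_kernel(fp_a, fp_b, gamma=0.5)
print(similarity)
```

A larger `gamma` makes the kernel value fall off faster as fingerprints differ, which is one reason the choice of this parameter affects model behaviour.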

    Comparative study of risk assessments for road transport of dangerous goods (Jämförelsestudie av riskbedömningar avseende vägtransport av farligt gods)

    Risk assessment of the transportation of hazardous material is a central decision support in urban planning close to transportation routes for hazardous material in Sweden. Such risk assessments are also uncertain and can give varying results. To highlight this, a comparative study is made to describe the differences between risk assessments in both implementation and application. The effects of these differences are also discussed, in particular how they affect societal planning in Sweden. One important conclusion is that there are big differences in how accident scenarios with hazardous material in classes 1, 2.1, 2.3 and 5 are assessed. There are also differences in the assumptions used when calculating societal risk that may affect the result and how risk is considered in societal planning.
    Dangerous goods are defined as substances or objects that, because of their properties, can cause harm to people, property and the environment, for example petrol, LPG and sulphuric acid. Dangerous goods are transported on roads throughout the country to enable industrial processes and other functions that benefit society. To ensure that buildings next to these roads are not exposed to unacceptable risks, risk assessments are carried out for the transport of dangerous goods. Sweden has an extensive housing shortage, and as a result metropolitan regions are working on densifying and expanding the urban environment. As part of this, areas next to roads carrying dangerous goods are being developed to an increasing extent, which means that the results of the risk assessments are more important than ever for urban planning. A risk assessment aims to investigate and judge whether the risk is acceptable or not. The concept of risk has many possible definitions. A frequently used definition in the context of safety risks is that a risk is a weighing together of the consequence of an event and the frequency, or probability, of the event occurring. This starting point is also used in risk assessments for the transport of dangerous goods. Estimating the frequency and consequence of an accident involving dangerous goods is associated with large uncertainties, since it is often difficult to estimate events with low probability and potentially large consequences. There is also literature showing that the results of risk assessments and risk analyses in many cases depend on who carries them out. To shed light on potential problems in the design and execution of risk assessments, a comparative study of risk assessments for the transport of dangerous goods is carried out. The comparative study systematically reviews the different steps of the risk assessments to identify the parameters or steps that can lead to large differences in results. The results show that there are significant differences between the risk assessments. The most significant differences lie in the frequency and consequence calculations for explosion, BLEVE, vapour cloud explosion and toxic gas cloud. These differences in turn mean that urban planning can be affected by who performs the risk assessment, which potentially leads to an uneven distribution of risks in society. To handle these differences, the steps with the greatest impact on the result and the greatest variation are highlighted. To drive development forward and handle this variation, a number of possible measures are proposed.
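The definition of risk used in this abstract, a weighing together of an event's frequency and its consequence, can be sketched as a frequency-weighted sum over accident scenarios. This is a minimal illustration, not the method of any assessment compared in the study: the scenario names echo the abstract, but every number and the function name `expected_risk` are hypothetical.

```python
def expected_risk(scenarios):
    """Sum frequency * consequence over all scenarios.

    `scenarios` maps a scenario name to (frequency per year, consequence),
    where the consequence is in whatever unit the assessor uses.
    """
    return sum(frequency * consequence for frequency, consequence in scenarios.values())


# Hypothetical values for illustration only; they are not results from the study.
assessor_a = {
    "explosion": (1e-6, 50.0),
    "BLEVE": (2e-7, 100.0),
    "toxic gas cloud": (5e-7, 20.0),
}

# A second, equally hypothetical assessor who assumes a different BLEVE frequency.
assessor_b = dict(assessor_a, BLEVE=(2e-6, 100.0))

print(expected_risk(assessor_a))
print(expected_risk(assessor_b))
```

Changing a single assumed frequency changes the total, which is the kind of assessor-dependent variation the abstract describes.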

    Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles

    Predictive modelling in drug discovery is challenging to automate, as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data, or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems lack the functionality needed to enable agile and flexible predictive modelling. Here we present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system, which we name SciLuigi. We also discuss our experiences from using the approach when modelling a large set of biochemical interactions on a shared computer cluster.
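The dependencies between tasks described in this abstract can be pictured as a small directed graph that is executed in dependency order. The sketch below uses only Python's standard library and does not use or imitate the Luigi or SciLuigi APIs; the task names and the helper `run_workflow` are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of upstream tasks it depends on (hypothetical names).
dependencies = {
    "split_data": set(),
    "tune_parameters": {"split_data"},
    "train_model": {"split_data", "tune_parameters"},
    "evaluate": {"train_model"},
}


def run_workflow(dependencies, run_task):
    """Call run_task on every task so that upstream tasks always run first."""
    order = list(TopologicalSorter(dependencies).static_order())
    for task in order:
        run_task(task)
    return order


executed = []
order = run_workflow(dependencies, executed.append)
print(order)
```

Keeping the dependency graph separate from the task implementations is what makes it possible to rewire a workflow without editing the tasks themselves, which is the kind of flexibility the abstract argues for.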

    Bioclipse poster 2008

    Poster for the Bioclipse software, presented at the 8th Swedish Bioinformatics Workshop in Uppsala, Sweden, in 2008.